The Effect of Anaphor and Ellipsis Resolution on Proximity Searching in a Text Database

Authors

  • Ari Pirkola
  • Kalervo Järvelin
Abstract

So far, methods for ellipsis and anaphor resolution have been developed, and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval (IR) of scientific abstracts. No significant improvement has been observed. In this study, the effects of ellipsis and anaphor resolution on proximity searching in a full-text database are analyzed. Anaphora and ellipses are classified on the basis of the type of their correlates/antecedents rather than, as is traditional, on the basis of their own linguistic type. The classification differentiates proper names and common nouns of basic words, compound words, and phrases. The study was carried out in a newspaper article database containing 55,000 full-text articles. A set of 154 keyword pairs in different categories was created. Human resolution of keyword ellipses and anaphora was performed to identify sentences and paragraphs which would match proximity searches after resolution. The findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore, the recall effect of restricted resolution of proper name phrases only was analyzed for keyword pairs containing at least one proper name phrase. Our findings indicate a recall increase of 38.2% in sentence searches and 28.8% in paragraph searches when proper name ellipses were resolved. The recall increase was 17.6% in sentence searches and 10.8% in paragraph searches when proper name anaphora were resolved. This suggests that some simple and computationally justifiable resolution method might be developed only for proper name phrases to support keyword-based full-text IR. Elements of such a method are discussed.
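The idea of sentence-level proximity matching before and after resolution can be illustrated with a minimal Python sketch. It is not the authors' procedure: the example text, the name "Anna Virtanen", the naive sentence splitter, the substring matching, and the helper names (`sentences`, `matches_pair`, `sentence_hits`, `relative_increase`) are all assumptions made for illustration. The resolved versions of the text are written by hand, mirroring the human resolution described in the abstract, and the percentages it yields are toy values unrelated to the reported results.

```python
import re

def sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def matches_pair(sentence, pair):
    """True if both keywords of the pair occur in the same sentence."""
    lowered = sentence.lower()
    return all(keyword.lower() in lowered for keyword in pair)

def sentence_hits(text, pair):
    """Sentences that would match a sentence-level proximity search."""
    return [s for s in sentences(text) if matches_pair(s, pair)]

def relative_increase(before, after):
    """Percentage increase in matching units (an assumed way to express the gain)."""
    return 100.0 * (after - before) / before

# Invented text: sentence 2 shortens the proper name phrase (ellipsis),
# sentence 3 replaces it with a pronoun (anaphor).
original = ("Anna Virtanen presented the budget on Monday. "
            "Virtanen defended the budget in parliament. "
            "She said the budget was balanced.")

# Hand-resolved variants, standing in for human resolution.
ellipsis_resolved = ("Anna Virtanen presented the budget on Monday. "
                     "Anna Virtanen defended the budget in parliament. "
                     "She said the budget was balanced.")
fully_resolved = ("Anna Virtanen presented the budget on Monday. "
                  "Anna Virtanen defended the budget in parliament. "
                  "Anna Virtanen said the budget was balanced.")

pair = ("Anna Virtanen", "budget")  # keyword pair with one proper name phrase

before = len(sentence_hits(original, pair))
after_ellipsis = len(sentence_hits(ellipsis_resolved, pair))
after_both = len(sentence_hits(fully_resolved, pair))

print(before, after_ellipsis, after_both)
print(relative_increase(before, after_ellipsis))
```

In this toy text only the first sentence matches the keyword pair before resolution. Restoring the full proper name phrase in the second sentence adds a second match, and resolving the pronoun in the third adds another.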

Similar resources

Grammar for ellipsis resolution in Japanese

This paper elucidates the linguistic mechanisms for resolving ellipsis (zero anaphor). The mechanisms consist of three tiers of linguistic system. [1] Japanese sentences are structured in such a way to anchor the topic, which is predominantly the subject (by Sentence devices), [2] with argument inferring cues on the verbal predicate (by Predicate devices), and [3] are cohesively sequenced with ...

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

A High-Performance Coreference Resolution System using a Constraint-based Multi-Agent Strategy

This paper presents a constraint-based multiagent strategy to coreference resolution of general noun phrases in unrestricted English text. For a given anaphor and all the preceding referring expressions as the antecedent candidates, a common constraint agent is first presented to filter out invalid antecedent candidates using various kinds of general knowledge. Then, according to the type of th...

An Investigation of the Factors Creating Ambiguity in Maqālāt Shams, with Emphasis on the Issue of Grammatical Cohesion

Maqālāt Shams, an eminent and significant work in the history of mysticism and Persian literature that is closely related to Rumi's life and work, has not only been ignored by common readers but has also not received proper attention from research scholars. One of the main reasons for this seems to be the scattered sentences and the lack of apparent firmness of the text, which have caused it to appear a...

ParseTalk about Textual Ellipsis

...orated model of functional preferences on Cf elements, which constrains the set of possible antecedents according to topic/comment patterns. 8 Conclusion. In this paper, we have outlined a model of text ellipsis parsing. It considers conceptual criteria to be of primary importance and provides a proximity measure in order to assess various possible antecedents for consideration of proper bridges...

Journal:
  • Inf. Process. Manage.

Volume 32  Issue 

Pages  -

Publication date: 1996